{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats\n",
    "from scipy.stats import norm\n",
    "from __future__ import print_function\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analysing One Sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(282629734)\n",
    "x = stats.t.rvs(df=10, size=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Descriptive Statistics\n",
    "\n",
    "How do the some sample properties compare to their theoretical counterparts?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "m, v, s, k = stats.t.stats(df=10, moments='mvsk')  # Theoretical Counterparts\n",
    "n, (smin, smax), sm, sv, ss, sk = stats.describe(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0 \t 1.25 \t 0.0 \t 1.0\n",
      "0.007760379167583533 \t 1.0639650855972986 \t 0.0027585736657464812 \t 0.20389024339467765\n"
     ]
    }
   ],
   "source": [
    "print(m, '\\t', v, '\\t', s, '\\t', k)\n",
    "print(sm, '\\t', sv, '\\t', ss, '\\t', sk)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### T-test and KS-test\n",
    "\n",
    "We can use t-test to test whether **the mean of sample differs** in a statistically significant way from the theoretical expectation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.23791359136285903 0.8119969000496687\n"
     ]
    }
   ],
   "source": [
    "value = stats.ttest_1samp(x, m)\n",
    "print(value.statistic, value.pvalue)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comparing Two Samples\n",
    "\n",
    "In the following, we are given two samples, which can come either from the same or from different distribution, and we want to test **whether they two have the same statistical properties**.\n",
    "\n",
    "### Comparing means\n",
    "\n",
    "**T检验用于判断两样本是否有显著性差别**\n",
    "\n",
    "**ttest_ind**的零假设：两者均值相同"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ttest_indResult(statistic=0.8771673121577234, pvalue=0.3806068672181312)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Samples with identical means\n",
    "rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)\n",
    "rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)\n",
    "stats.ttest_ind(rvs1, rvs2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ttest_indResult(statistic=-3.8568417151087644, pvalue=0.00012222309536958636)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Samples with different means\n",
    "rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)\n",
    "stats.ttest_ind(rvs1, rvs3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于rvs1与rvs2，p值较高，无法拒绝原假设；\n",
    "\n",
    "对于rvs1与rvs3，p值小于0.05，则拒绝原假设，即**两者均值不相同，有显著性差异**。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Kolmogorov-Smirnov test for two samples ks_2samp\n",
    "\n",
    "**KS检验用于判断两样本是否来自同一分布**\n",
    "\n",
    "**ks_2samp**的零假设：两者同分布"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ks_2sampResult(statistic=0.03799999999999998, pvalue=0.8567045353837416)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stats.ks_2samp(rvs1, rvs2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ks_2sampResult(statistic=0.12400000000000005, pvalue=0.0008097301144065585)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stats.ks_2samp(rvs1, rvs3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于rvs1, rvs2, p值较高，无法拒绝原假设，则两者同分布；\n",
    "\n",
    "对于rvs1, rvs3，p值小于0.05，则拒绝原假设，即两者同分布！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
